We are going to use the Scarr-Rowe effect (or hypothesis) to look more closely at interaction effects. The Scarr-Rowe hypothesis states that the (genetic) heritability of a trait depends on the environment, such that the effects of genes are larger when environments are better. The underlying idea is that if everyone lives in a perfect environment, i.e. there is no variation in the relevant environment, then variation in a trait can only come from genes.
This interaction can be visualized as follows:
par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
heritability = c(scarce = 0.5, plentiful = 0.8)
barplot(heritability,
ylim = c(0,1),
xlab = "environment",
ylab = "heritability (effect of genes)")
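The variance logic behind this can be checked with a quick simulation (a minimal sketch, not part of the analysis below, with made-up standard deviations): when environmental variation shrinks, the share of trait variance explained by genes, i.e. the heritability, goes up.

```r
set.seed(1)
N = 1e5
g = rnorm(N)                        # genetic component, variance 1
trait_poor = g + rnorm(N, sd = 2)   # poor setting: much environmental variation
trait_good = g + rnorm(N, sd = .5)  # good setting: little environmental variation
h2 = function(trait) var(g) / var(trait)  # share of trait variance due to genes
round(c(poor = h2(trait_poor), good = h2(trait_good)), 2)
# heritability is much higher when environmental variation is small
```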
Here is a DAG that describes such a model, where the Scarr-Rowe effect means that the coefficient of the path \(\small A \rightarrow IQ\) depends on \(\small SES\).
library(dagitty)
dag = dagitty(
"dag{
IQ;
A -> IQ;
C -> IQ;
E -> IQ;
SES -> IQ;
}")
coord.list =
list(
x=c(A=1,C=2,E=3,SES=2,IQ=2),
y=c(A=-1,C=-1,E=-1,SES=1,IQ=0))
coordinates(dag) = coord.list
drawdag(dag, cex = 1.5)
Note that DAGs do not encode interaction effects by drawing an arrow from the moderator to the relevant path. Instead, there is a path from the moderator to the outcome variable. So the DAG only tells us that IQ is a function of four other variables \(\small IQ = f(A,C,E,SES)\), but it does not tell us what the function \(f()\) is.
This could be a simple additive function, e.g. \(\small IQ = \beta_A A + \beta_C C + \beta_E E + \beta_{SES} SES\), or any other imaginable function of these variables. The model we have in mind now is:
\[ IQ = f_1(SES) A + f_2(SES) C + E \]
Let's simulate data that show the Scarr-Rowe effect, starting with our exogenous variables. To keep things simple, we assume that all variables are normally distributed:
set.seed(3)
N = 250
SES = rnorm(N, mean = 5, sd = 1)
A = rnorm(N,mean = 10,2)
C = rnorm(N,mean = 10,2)
E = rnorm(N, sd = 7)
Now comes the interesting part: Scarr-Rowe assumes that the effect or weight of genes depends on SES. So we formulate the weight of genes as a function of SES. For good measure, we also let the effect of the familial environment vary with SES.
library(boot)
# we use inv.logit to give the weights a lower and upper bound
b_A = function(SES) 1 + inv.logit(SES-5)*3
b_C = function(SES) inv.logit(-SES+5)
We are literally defining the weight for A as a function of SES. This is one way to understand interactions. Here is a visualization of the weights:
par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
curve(b_A(x),2,8, ylab = expression("effect size"~beta),
xlab = "SES", ylim = c(0,4), col = "red")
curve(b_C(x),2,8, xlab = "SES", add = T, col = "blue")
text(c(5,5),
c(b_A(5), b_C(5)),
labels = c(expression(beta[A]),
expression(beta[C])), pos = 3)
In this plot, the interaction is not encoded by the fact that \(\small \beta_A\) and \(\small \beta_C\) have different slopes, but by the fact that both have a non-zero slope in the first place. The figure shows two interactions, because the effect sizes of both \(\small A\) and \(\small C\) change with SES.
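A quick numeric sanity check of the bounds these weight functions imply (a self-contained sketch; `plogis()` is base R's inverse logit, the same function as `boot::inv.logit`):

```r
# plogis() is base R's inverse logit, identical to boot::inv.logit
b_A = function(SES) 1 + plogis(SES - 5) * 3  # bounded between 1 and 4
b_C = function(SES) plogis(-SES + 5)         # bounded between 0 and 1
# at SES = 5 both weights sit exactly in the middle of their range;
# at extreme SES values they approach their bounds
round(c(b_A(5), b_A(-100), b_A(100)), 2)
round(c(b_C(5), b_C(-100), b_C(100)), 2)
```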
And now we can simulate IQ values from the exogenous variables and the derived weights. We assume that IQ depends on familial factors \(\small C\) and additive genetic effects \(\small A\): \(\small IQ = f_1(SES) A + f_2(SES) C + E\). Let's simulate data from this model:
# Adding 70 to get an IQ around 100
IQ = 70 + b_C(SES)*C + b_A(SES)*A + E
hist(IQ)
If we just look at the bivariate associations between \(\small A\) or \(\small SES\) and \(\small IQ\), we get the following plot:
par(mfrow = c(1,2),mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(SES,IQ, pch = 16, cex = .5)
#abline(lm(IQ~SES))
plot(A,IQ, pch = 16, cex = .5)
#abline(lm(IQ~A))
# plot(C,IQ)
# abline(lm(IQ~C))
To run a simple interaction model with a categorical moderator, we use only a subset of our sample, the top and bottom 30%, and code a categorical SES variable SES.c:
low.SES = which(SES < quantile(SES,.30))
high.SES = which(SES > quantile(SES,.70))
idx = c(low.SES,high.SES)
dt = data.frame(
A = A[idx],
C = C[idx],
IQ = IQ[idx],
SES = SES[idx],
SES.c = c(rep(1,length(low.SES)),
rep(2,length(high.SES))))
Let's plot the association between \(\small A\) and \(\small IQ\) again, this time split by SES:
low.SES = dt$SES.c == 1
high.SES = dt$SES.c == 2
plot_data = function(dt) {
par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(IQ ~ A, data = dt, col = dt$SES.c, pch = 16, cex = .5,
ylim = range(dt$IQ), ylab = "IQ", xlab = "A")
legend("topleft", pch = 16, col = c("black","red"),
legend = c("low SES","high SES"), bty = "n")
}
plot_data(dt)
#abline(lm(IQ ~ A, data = dt[low.SES,]), col = "black")
#abline(lm(IQ ~ A, data = dt[high.SES,]), col = "red")
Next, we estimate linear regression models of increasing complexity; our goal is a model that accurately depicts the effects of A, C, E, and SES on IQ.
We start with a simple linear regression. But first some intuitions on the priors. For the intercept we use a ~ dnorm(100,5), because IQ scores are scaled to have a mean around 100. For the slope we use b ~ dnorm(0,4). This prior allows for the possibility of a negative effect of \(\small A\). Note that for this prior to work, we center \(\small A\) on its mean of 10. For \(\small \sigma\) we use the exponential prior dexp(.5).
IQ.A =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + b*(A - 10),
a ~ dnorm(100,5),
b ~ dnorm(0,4),
sigma ~ dexp(.5)
),
data=dt)
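To build some intuition for what these priors imply, we can also sample from them directly, without involving the model (a minimal base-R sketch using the same prior parameters as above):

```r
set.seed(1)
a = rnorm(1e4, 100, 5)  # prior for the intercept (mean IQ at A = 10)
b = rnorm(1e4, 0, 4)    # prior for the slope of A
# plausible mean IQ at the centering point A = 10:
quantile(a, c(.05, .95))
# implied mean IQ two units of A above the center; the wide slope prior
# allows both strongly positive and strongly negative effects of A:
quantile(a + b * 2, c(.05, .95))
```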
Let's quickly look at the prior predictions to make sure the priors are OK.
prior = extract.prior(IQ.A)
A.seq = seq(from=0,to=20,length.out=50)
mu = link(
IQ.A,
post=prior,
data=data.frame(A=A.seq))
par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(
0,type = "n", ylab = "IQ", xlab = "A",
xlim = c(min(A),max(A)),
ylim = c(70,130))
matlines(
A.seq,t(mu[1:100,]),type = "l", lty = 1,
col = adjustcolor("blue", alpha = .5))
Yes, this looks good.
Here are the posterior predictions:
plot_data(dt)
plot_mu.CIs(IQ.A, data.frame(A=A.seq), "blue", spaghetti = TRUE)
To fit a model with a main effect of SES, we use the indexing approach:
set.seed(1)
IQ.A_SES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a[SES.c] + b*(A - 10),
a[SES.c] ~ dnorm(100,5),
b ~ dnorm(0,4),
sigma ~ dexp(.5)
),
data=dt)
plot_data(dt)
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 2), "red")
We can see already from the plot that the model with separate means for high and low SES is better. But here is the comparison with PSIS-LOO:
compare(
IQ.A,IQ.A_SES,
func = "PSIS") %>%
round(2)
## PSIS SE dPSIS dSE pPSIS weight
## IQ.A_SES 1014.17 14.62 0.00 NA 4.07 1
## IQ.A 1077.78 15.52 63.61 12.53 3.13 0
Lastly, we can also let the slopes vary by SES:
IQ.AxSES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a[SES.c] + b[SES.c]*(A - 10),
a[SES.c] ~ dnorm(100,5),
b[SES.c] ~ dnorm(0,2),
sigma ~ dexp(.5)
),
data=dt)
And again the posterior predictions:
par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot_data(dt)
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 2), "red")
Even though the DGP does not have different “intercepts” for high and low SES, we have to add them, because our model puts the intercept at A = 10, whereas it was at A = 0 in the DGP.
Here is a comparison of all the models we have fit:
compare(
IQ.A, IQ.A_SES, IQ.AxSES,
func = "PSIS") %>%
round(2)
## PSIS SE dPSIS dSE pPSIS weight
## IQ.AxSES 1007.85 16.20 0.00 NA 5.22 0.97
## IQ.A_SES 1014.70 14.69 6.85 6.79 4.30 0.03
## IQ.A 1077.95 15.61 70.11 14.00 3.20 0.00
We get a warning about Pareto k values being higher than 0.5, but this is (more or less) OK as long as k does not exceed 0.7 or 1.
So how can we figure out if the difference in slopes is reliably larger than 0? We simply extract posteriors and make some calculations with them:
post = extract.samples(IQ.AxSES)
names(post)
## [1] "sigma" "a" "b"
head(post$b)
## [,1] [,2]
## [1,] 1.1506010 2.980423
## [2,] 1.4792998 2.571067
## [3,] 1.1977443 2.863966
## [4,] 0.8793765 2.514056
## [5,] 1.0415250 3.043657
## [6,] 1.6412887 2.227408
We simply calculate the difference between the two b parameters:
delta.b = post$b[,2]-post$b[,1]
And now we can show, for example, the mean and percentile interval (PI):
c(mean = mean(delta.b),
PI(delta.b,prob = c(.9)))
## mean 5% 95%
## 1.548680 0.700257 2.401483
And we can plot a histogram of the contrast:
hist(delta.b, xlim = range(c(0,delta.b)))
abline(v = 0, lty = 2)
abline(v = PI(delta.b,prob = c(.95)), col = "red")
To see if our results are reasonable, let's compare the estimated parameters with the parameters governing the DGP. First the model parameters:
precis(IQ.AxSES, depth = 2) %>% round(2)
## mean sd 5.5% 94.5%
## a[1] 95.40 0.78 94.16 96.64
## a[2] 105.25 0.76 104.03 106.47
## b[1] 1.34 0.36 0.76 1.92
## b[2] 2.89 0.37 2.29 3.48
## sigma 6.65 0.38 6.05 7.25
Now the parameters from the DGP:
low.SES = which(SES < quantile(SES,.30))
high.SES = which(SES > quantile(SES,.70))
rbind(
b.low.SES = b_A(SES[low.SES]) %>% mean(),
b.high.SES = b_A(SES[high.SES]) %>% mean())
## [,1]
## b.low.SES 1.760083
## b.high.SES 3.275656
We are not recovering the exact parameters; after all, we used a golem instead of a model that exactly depicts the DGP. But qualitatively, the results are consistent with the DGP.
Earlier, we described this function
\[ IQ = f_1(SES) A + ... \]
by saying that the effect of \(\small A\) is a function of \(\small SES\). However, we really just have two variables, \(\small A\) and \(\small f_1(SES)\), which are multiplied with each other. Therefore, these statements are equivalent: the effect of \(\small A\) depends on \(\small SES\), and the effect of \(\small SES\) depends on \(\small A\).
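This symmetry is easy to verify numerically for the usual multiplicative interaction term (a sketch with made-up illustrative coefficients, not our fitted model):

```r
# mu = a + bA*A + bS*S + bAS*A*S, with arbitrary illustrative coefficients
a = 100; bA = 2; bS = 3; bAS = 0.5
mu = function(A, S) a + bA*A + bS*S + bAS*A*S
# the effect of a one-unit change in A depends on S ...
mu(11, 4) - mu(10, 4)  # bA + bAS*4 = 4
# ... and the effect of a one-unit change in S depends on A
mu(10, 5) - mu(10, 4)  # bS + bAS*10 = 8
```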
Accordingly, we can also plot the difference between high and low \(\small SES\) as a function of \(\small A\):
A.seq = seq(from=5,to=16,length.out=30)
mu.low = link(IQ.AxSES,data=data.frame(SES.c=1,A=A.seq))
mu.high = link(IQ.AxSES,data=data.frame(SES.c=2,A=A.seq))
delta = mu.high-mu.low
CIs = apply(delta,2,PI)
par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(A.seq, colMeans(delta),'l',
ylim = range(CIs),
col = "blue",
xlab = "A",
ylab = expression(IQ[high~SES]~-~IQ[low~SES]))
shade(CIs,A.seq, col = adjustcolor("blue", alpha = .25))
abline(h = 0, lty = 2)
text(12,0,"higher IQ with lower SES", pos = 1, adj = 0)
text(12,0,"higher IQ with higher SES", pos = 3, adj = 0)
We are continuing with our simulated Scarr-Rowe data set, but this time we use the full data and formulate a continuous interaction.
dt = data.frame(
A = A,
C = C,
IQ = IQ,
SES = SES)
You can try it the fancy way and make a 3d plot, but in this instance it's a lot of effort for meager results:
library(plot3D)
points3D(A,SES,IQ, type = "h",
col = "black",
cex = .75,
lty = 3,
pch = 16,
phi = 20,
theta = 45,
xlab = "A",
ylab = "SES",
zlab = "IQ")
A panel of 2d plots does the job better, and will later allow us to display uncertainty.
qs = quantile(SES, probs = seq(0,1,.25))
par(mfrow = c(1,4), mar=c(2.5,2.5,2,.5), mgp=c(1.5,.5,0), tck=-.01)
for (k in 2:length(qs)) {
idx = which(dt$SES > qs[k-1] & dt$SES < qs[k])
tmp.dt = dt[idx,]
with(tmp.dt,
plot(IQ~A, pch = 16, main = paste0(k-1,". quantile SES"),
ylim = range(dt$IQ), xlim = range(dt$A)))
}
Following the specification of our DGP, we can describe the interaction in the quap model as a simple multiplication. Before we do that, we first fit a model with only main effects of \(\small A\) and \(\small SES\):
IQc.A_SES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + ba*(A - 10) + bs*(SES - 2.5),
a ~ dnorm(100,10),
ba ~ dnorm(0,4),
bs ~ dnorm(0,4),
sigma ~ dexp(.5)
),
data=dt)
We use prior predictions to check if the priors do a good job:
plot.pred(IQc.A_SES,dt, type = "prior")
This is pretty wild. Let's try somewhat narrower priors:
set.seed(1)
IQc.A_SES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + ba*(A - 10) + bs*(SES - 2.5),
a ~ dnorm(100,7.5),
ba ~ dnorm(0,2),
bs ~ dnorm(0,2),
sigma ~ dexp(.5)
),
data=dt)
plot.pred(IQc.A_SES,dt, type = "prior")
These priors are reasonable, so we look at the predictions:
plot.pred(IQc.A_SES,dt)
As expected from the model we specified, all slopes are the same.
Note that the different “intercepts” we see come from this part of
the model: mu <- ... + bs*(SES - 2.5).
The next model is similar to the previous one, but adds this interaction term:
bsa*(A - 10)*(SES - 2.5).
IQc.AxSES.m =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + ba*(A - 10) + bs*(SES - 2.5) + bsa*(A - 10)*(SES - 2.5),
a ~ dnorm(100,7.5),
ba ~ dnorm(0,2),
bs ~ dnorm(0,2),
bsa ~ dnorm(0,1),
sigma ~ dexp(.5)
),
data=dt)
plot.pred(IQc.AxSES.m,dt, type = "prior")